ABSTRACT

The efficiencies of various data compression techniques as
applied to color maps are compared. These color maps have
certain special characteristics, such as large homogeneous
regions and fine detail such as lines and lettering. The color
maps are first classified using the K-means clustering
algorithm with neighborhood classification. Three techniques
are investigated: contour, quadtree and run-length coding.

The run-length coding algorithm is modified to allow runs to
wrap around from one row to the next. A modification of the
standard binary image quadtree compression algorithm for color
images is introduced. In quadtree coding, a modified eldest-son,
eldest-younger-sibling quadtree is used to reduce the memory
required to store the quadtree. Lempel-Ziv compression is
applied to the classified and unclassified images as well as to
the output of the compression algorithms. The algorithms are
compared on the compression ratios achieved.


INTRODUCTION 

This paper presents a comparative evaluation of various
compression techniques as applied to color maps. The maps
have certain specific characteristics, such as large homogeneous
regions and fine detail. The compression techniques being
considered deliver good compression only when the images
have large homogeneous regions. The maps are therefore
classified using a neighborhood classification algorithm before
the compression algorithms are applied to the images.
Run-length, contour and quadtree coding are investigated.

The maps in question are digitized versions of printed maps.
Because the maps are printed by half-toning, the digitized
images end up having a grainy salt-and-pepper character, with
adjacent pixels differing in color. Further, the coloring varies
from map to map due to the varying ages and printing
techniques of the maps. As a result, the maps do not have
uniform coloring. The maps under consideration also have fine
detail such as latitude and longitude lines and lettering. These
characteristics place a bound on the degree of compression
that can be achieved with the compression techniques.

The maps are classified to overcome the limitation due to
inhomogeneity in the maps. The K-means clustering
algorithm was used to classify the image into a much smaller
number of classes (8, compared with the 256 colors present in
the original images) [1,3]. The maps are also classified using
vector quantization with differing numbers of codewords and
various distance criteria [6,8]. Neighborhood classification was
applied in order to improve the quality of the image.

The presence of large homogeneous regions in the maps
suggests that techniques such as quadtree and run-length
coding would be applicable. Quadtree compression has been
applied to binary images and has been shown to give good
compression [11]. In [15] the authors applied quadtree
compression to line drawing map overlays; a 512 x 512 image
could be stored in as few as 130 nodes. A modified version of
the standard quadtree algorithm, applicable to color images, is
proposed and evaluated. Run-length and contour coding have
been the subject of extensive research. The run-length coding
algorithm of [2] and the contour coding algorithm of [16] are
applied to color map compression. Since the images have large
homogeneous regions, the runs are allowed to wrap around
from one row to another. Before the compression algorithms
are applied, the images are converted from 24 bits/pixel to
8 bits/pixel pseudo-color images. Each image file then consists
of a gray level image file and a color look-up table. Thus a
compression of 3:1 is already achieved before any compression
techniques are applied.


COMPRESSION  TECHNIQUES 
Quadtree  Coding 

Quadtree coding as applied to color maps is discussed in this
section. Quadtree compression for binary images has been the
subject of substantial research [11]. The presence of large
homogeneous regions in the maps, together with the small
number of colors, led to the choice of quadtree compression. A
variant of the standard quadtree algorithm as applied to binary
images is used. The rooted tree approach to a quadtree is used
to minimize the memory required to store the quadtree.

The efficiency of quadtrees in image processing applications is
based on the principle of recursive decomposition [11].
Recursive decomposition of a picture is performed by
successively decomposing the image into smaller and smaller
regular polygons. These regular polygons or tilings of the
image plane are referred to as tessellations. Typical
tessellations are squares, equilateral triangles, hexagons and
isosceles right triangles. Due to the presence of large
homogeneous regions in the image, the feasibility of using
octrees instead of quadtrees was explored. Since octagonal
tessellations are not unlimited, this is not feasible and square
tessellations were chosen.

Figure 1. Conventional quadtree representation.

The quadtree can be considered as a variant of a binary tree
with each parent having four sons (Figure 1). Typically, each
node of a quadtree consists of a value element, pointers to the
4 children, a pointer to the parent and a pointer to the sibling,
for a total of 6 pointers plus one value element per node. The
amount of memory used to set up the quadtree is reduced by
using only three pointers per node. The structure used is called
a rooted tree [7,17]. We store a rooted tree using the eldest
child, eldest younger sibling method illustrated in Figure 2.
Each node of the tree consists of one value (color) element, one
son pointer which points to the eldest of the node's four
children, a sibling pointer which points to the next youngest
sibling, and a parent pointer which points to the parent of the
node. Pointers are set to NIL if no nodes exist to which they
are to point.
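
The three-pointer node described above can be sketched as follows. This is a minimal illustration rather than the structure used in the paper: the class name Node, the helpers add_children and children, and the use of None for NIL are all assumptions made for the example.

```python
class Node:
    """Quadtree node with one value element and three pointers."""

    def __init__(self, color, parent=None):
        self.color = color      # value (color) element
        self.son = None         # eldest child, or None (NIL)
        self.sibling = None     # next younger sibling, or None (NIL)
        self.parent = parent    # parent node, or None (NIL) for the root


def add_children(parent, colors):
    """Attach children eldest first, chaining them by sibling pointers."""
    previous = None
    for color in colors:
        child = Node(color, parent=parent)
        if previous is None:
            parent.son = child          # only the eldest is reachable directly
        else:
            previous.sibling = child    # younger ones via the sibling chain
        previous = child


def children(node):
    """Recover all children by following son, then sibling pointers."""
    result = []
    child = node.son
    while child is not None:
        result.append(child)
        child = child.sibling
    return result
```

All four children remain reachable through one son pointer and the sibling chain, which is what allows the four child pointers of the conventional node to be dropped.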

In addition, the binary image algorithm has to be modified to
account for the larger number of colors in the image. Each
pixel of the classified image has one of eight possible colors.
The colors are of two types: pure colors, which occur in the
image, and a shade called GRAY. The shade GRAY is used in
the quadtree compression algorithm to indicate that a
particular node has sons of differing pure colors.

In order to ease computation and reconstruction of the image,
a numbering scheme for the nodes of the quadtree is
introduced. The root node has the number 0. Further levels are
divided into blocks of 2 x 2 and then numbered in a clockwise
ascending order. This numbering scheme is illustrated in
Figure 3. The input image had a size of 575 x 639. These odd
dimensions would result in a major handling overhead, since
the quadtree method works best when the size of the image is
of the form 2^n x 2^n, where n is an integer. The bottom-most
level of the quadtree was therefore assumed to be of size
2^n x 2^n, and the image was read into the top left-hand
corner of the bottom level.
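
Placing the image in the top left-hand corner of a 2^n x 2^n bottom level can be sketched as below. The function names and the fill value are assumptions for illustration; the text does not say what the unused part of the bottom level contains. For a 575 x 639 image the smallest such square is 1024 x 1024, although the size actually used is not stated legibly here.

```python
def padded_size(rows, cols):
    """Smallest power of two that is >= both image dimensions."""
    largest = max(rows, cols)
    return 1 << (largest - 1).bit_length()


def pad_image(image, fill):
    """Copy image into the top-left corner of a square power-of-two grid."""
    rows, cols = len(image), len(image[0])
    size = padded_size(rows, cols)
    padded = [[fill] * size for _ in range(size)]
    for r, row in enumerate(image):
        padded[r][:cols] = row      # the remainder of each row keeps `fill`
    return padded
```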

Operation  of  quadtree  compression  is  explained  below: 


Figure  2.  Eldest  son-eldest  younger  sibling  quadtree. 


set up tree()

    for each block of four sibling nodes, from the bottom level up
        if (they are of the same color)
        then
            color of parent = color of children
        else
            color of parent = GRAY
        if (four blocks are the same color and the color is not GRAY)
        then
            son pointer of parent is set to NIL
        if (the block under consideration is the root of the tree)
        then
            stop

Once we reach the root of the tree, the tree is trimmed. A
parent value equal to a pure color (not GRAY) represents an
entire block of pixels in the image which possess this particular
color, so it makes sense to store only the parent and trim off
the succeeding generations of children. Since the images
possess large homogeneous regions, some of the lower
generations of the tree can be discarded. In this manner it is
possible to store the entire image with only a portion of the
quadtree. This results in memory savings.
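
The merge-and-trim idea can be sketched recursively. This is an illustrative stand-in, not the pointer-based procedure of the paper: a pure-color node is represented by its integer color, a GRAY node by a list of its four sons, and the clockwise son order starting at the top left is an assumption.

```python
def build(image, top, left, size):
    """Return an int for a uniform block, else a list of four subtrees."""
    if size == 1:
        return image[top][left]
    half = size // 2
    sons = [
        build(image, top, left, half),                 # top left
        build(image, top, left + half, half),          # top right
        build(image, top + half, left + half, half),   # bottom right
        build(image, top + half, left, half),          # bottom left
    ]
    first = sons[0]
    if not isinstance(first, list) and all(s == first for s in sons):
        return first    # four sons share one pure color: keep only the parent
    return sons         # sons differ: this node plays the role of GRAY


def count_nodes(tree):
    """Number of nodes retained after trimming."""
    if isinstance(tree, list):
        return 1 + sum(count_nodes(s) for s in tree)
    return 1
```

For a 4 x 4 image with three uniform quadrants, the trimmed tree keeps 9 nodes, whereas the complete tree would hold 21.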

Figure 3. Numbering scheme for quadtree nodes.


A modified quadtree traversal algorithm for the rooted quadtree
is given below. In [14] a quadtree traversal algorithm is
described for a quadtree with each node having four son
pointers and a parent pointer. The three-pointer configuration
of the nodes used here necessitates a different traversal
algorithm. The algorithm used was a modified version of the
standard preorder traversal algorithm [7]. The large number of
recursive calls to the tree traversal procedure could result in
stack overflow. To avoid this, the tree traversal algorithm is
applied separately to each of the four sub-trees of the root.

The traversal algorithm is described below in pseudocode:

traverse(node)

    if (node has no children)
    then
        visit(node)
        if (node has a sibling)
        then
            traverse(sibling)
        if (node has no sibling)
        then
            if (parent of node has sibling)
            then
                traverse(sibling of the parent of the node)
            else
                traverse(sibling of the "grand" parent of the node)
    if (node is a root)
    then
        visit(node)
        traverse(son)
    return (when node is root of the quadtree)

In visit(node) the node number and value of each node are
written to an output file.
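
A preorder walk over the three-pointer tree can also be written without recursion, which sidesteps the stack overflow concern entirely. The sketch below is an iterative variant offered for illustration, not the recursive procedure given above; the names Node, add_children and preorder are hypothetical, and visiting is reduced to collecting the node colors.

```python
class Node:
    def __init__(self, color, parent=None):
        self.color = color
        self.son = None         # eldest child
        self.sibling = None     # next younger sibling
        self.parent = parent


def add_children(parent, colors):
    """Attach children eldest first, chained by sibling pointers."""
    previous = None
    for color in colors:
        child = Node(color, parent=parent)
        if previous is None:
            parent.son = child
        else:
            previous.sibling = child
        previous = child


def preorder(root):
    """Visit every node in preorder using only son/sibling/parent pointers."""
    order = []
    node = root
    while node is not None:
        order.append(node.color)            # visit(node)
        if node.son is not None:
            node = node.son                 # descend to the eldest child
            continue
        # No children: climb until some ancestor has a younger sibling.
        while node is not root and node.sibling is None:
            node = node.parent
        if node is root:
            break                           # whole tree has been visited
        node = node.sibling
    return order
```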

The  decoding  algorithm  involves  setting  up  the  quadtree  as  a 
first  step.  The  output  file  of  the  quadtree  compression 
algorithm  consists  of  colors  of  nodes  and  node  numbers  of  the 
portion  of  the  quadtree  that  has  to  be  retained.  These  are  then 
allotted  to  the  appropriate  nodes. 

The  decoding  algorithm  for  quadtree  compression  is  described 
below 

decode quadtree()

    for each level of the quadtree
        if (level is not the bottom level)
        then
            for each node in the level
                if (color of node is a pure color)
                then
                    children of node have same color as parent
        if (level is bottom level)
        then
            stop
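
The effect of decoding, pushing each pure color down to every pixel of its block, can be sketched as follows. The input format is an assumption made for the example: instead of the node numbers and colors written by the coder, a pure-color node is an integer and a GRAY node is a list of four sons in clockwise order from the top left.

```python
def fill(tree, image, top, left, size):
    """Paint the block covered by `tree` into `image`."""
    if not isinstance(tree, list):
        # Pure color: every pixel of the block inherits the parent's color.
        for r in range(top, top + size):
            for c in range(left, left + size):
                image[r][c] = tree
        return
    half = size // 2
    # Clockwise from the top left; this ordering is an assumption.
    offsets = [(0, 0), (0, half), (half, half), (half, 0)]
    for son, (dr, dc) in zip(tree, offsets):
        fill(son, image, top + dr, left + dc, half)


def decode(tree, size):
    """Reconstruct a size x size image from a trimmed quadtree."""
    image = [[None] * size for _ in range(size)]
    fill(tree, image, 0, 0, size)
    return image
```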

Run-length  Coding 

The  presence  of  large  homogeneous  regions  in  the  maps 
indicates  that  run-length  coding  would  result  in  good 
compression.  The  presence  of  fine  detail  and  lettering  on  the 
maps  results  in  a  limit  on  the  length  of  the  runs  and  hence  on 
the  compression  attainable.  Since  large  regions  of  the  images 
have  the  same  color,  runs  are  allowed  to  wrap  around  from  one 
row  of  pixels  to  another. 

Run-length coding exploits the lengthwise correlations that
exist in the maps. The coding algorithm described in [2] is used.

The  operation  of  the  run  length  coding  algorithm  is 
summarized  below 


run length coding(row)

    for each pixel
        if (next pixel in the same row is of the same color)
        then
            length of run = length of run + 1
        if (next pixel in the same row is not the same color)
        then
            store length of run and color
        if (end of row is reached)
        then
            continue run on row + 1

In  the  maps  under  consideration,  there  are  many  long  runs. 
Substantial  memory  savings  result  from  the  fact  that  only  the 
color  of  the  first  pixel  in  a  run  and  the  length  of  the  run  have  to 
be  stored.  The  unclassified  image  is  a  salt  and  pepper  image 
and  hence  results  in  poor  compression.  Classification  of  the 
images  improves  the  performance  of  the  algorithm. 
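
Run-length coding with wrap-around amounts to coding the image in raster order as one long sequence. The sketch below illustrates that idea only; it is not the algorithm of [2], and the function names and the (color, length) output format are assumptions.

```python
def encode_runs(image):
    """Return (color, length) pairs; runs continue across row boundaries."""
    runs = []
    for row in image:
        for pixel in row:
            if runs and runs[-1][0] == pixel:
                runs[-1][1] += 1            # extend the current run
            else:
                runs.append([pixel, 1])     # start a new run
    return [(color, length) for color, length in runs]


def decode_runs(runs, cols):
    """Expand the runs and cut the pixel stream back into rows."""
    flat = [color for color, length in runs for _ in range(length)]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]
```

In a two-row example whose first four pixels in raster order share a color, the run of length 4 spans the row boundary, where a row-by-row coder would store two runs.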

Contour  Coding 

The maps have large regions of only one color. Contour
coding achieves memory savings by storing the perimeter
and the color of each region. The algorithm used for contour
coding is the one given by Wilkins in [16].

Each contour is uniquely determined by the color of the initial
point of the contour, its location, and a sequence of directionals
that give the direction of travel around the contour. The
contour coding algorithm used consists of two parts: the T
algorithm, which traces the contour, and the IP algorithm, which
determines whether a pixel is the initial point of a new contour.

The  T  algorithm  traces  contours  based  on  the  LML  (Left  Most 
Looking)  rule  which  is  described  below: 

LML rule(contour)

    for each point
        if (pixel on left has same color)
        then
            move left
        else
            if (pixel ahead has same color)
            then
                move ahead
            else
                if (pixel to the right has same color)
                then
                    move right
                else
                    move back
        if (new point is the initial point)
        then
            IP algorithm()
        else
            continue LML(contour)
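
The left-most-looking step can be sketched as follows. This is only an illustration of the rule above, not the T algorithm of [16]: the indicators are not maintained, the initial heading (east) is an assumption, and the trace stops the first time it returns to the initial point, which may be too early for contours that pass through that point more than once.

```python
# Headings in clockwise order: north, east, south, west (row, col offsets).
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]
NAMES = "NESW"


def same_color(image, r, c, color):
    """True if (r, c) lies inside the image and has the given color."""
    return (0 <= r < len(image) and 0 <= c < len(image[0])
            and image[r][c] == color)


def lml_step(image, r, c, heading):
    """Try left, ahead, right, back; return (r, c, heading) or None."""
    color = image[r][c]
    for turn in (-1, 0, 1, 2):              # left, ahead, right, back
        h = (heading + turn) % 4
        dr, dc = HEADINGS[h]
        if same_color(image, r + dr, c + dc, color):
            return r + dr, c + dc, h
    return None                             # isolated pixel: nowhere to go


def trace(image, r0, c0, heading=1):
    """Return the directionals of the contour starting at (r0, c0)."""
    directions = []
    r, c, h = r0, c0, heading
    limit = 4 * len(image) * len(image[0])  # safety bound on the walk
    while len(directions) < limit:
        step = lml_step(image, r, c, h)
        if step is None:
            break
        r, c, h = step
        directions.append(NAMES[h])
        if (r, c) == (r0, c0):
            break                           # back at the initial point
    return "".join(directions)
```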

Indicators are assigned to each point to distinguish initial
points from points on the interior of a contour. All pixels are
initially assigned an initial indicator. The indicator is then
reassigned based on the direction of travel into and out of the
pixel. The IP algorithm begins at the next pixel to the right of
the initial point of the last contour and then scans the image
row by row for the next initial point. The IP algorithm, which
locates initial points of contours, is described below.

IP algorithm()

    if (indicator is not the same as initial indicator)
    then
        next pixel
    if (indicator is the initial indicator and color same as
        previous initial point)
    then
        next pixel
    else
        pixel is initial point


Results 

The unclassified and K-means classified maps were
compressed using Lempel-Ziv coding. The unclassified image
gave a compression ratio of only 1.4:1, whereas the K-means
classified image was compressed by a ratio of 8:1. A typical
image was compressed using run-length coding. The
compression ratio obtained was 2.8:1. The total compression
time was 12 seconds and the decoding time was 9 seconds.
Compression of this order is expected because of the presence
of large homogeneous regions. The length of the longest run
was 520 pixels. The output of the run-length coder for the
classified image was Lempel-Ziv coded, resulting in a
compression of 6.7:1. A histogram of the lengths of the runs is
presented in Figure 4.


Figure  4.  Histogram  of  run  lengths. 

Quadtree coding of the maps resulted in a compression of 7:1.
The output of the quadtree coding algorithm was then
compressed using Lempel-Ziv coding. The compression ratio
achieved was 10.5:1. The coding and decoding times for
quadtree coding are of the order of 10 minutes. All compression
ratios mentioned above are effectively multiplied by a factor of
3 when the initial transformation from a 24 bits/pixel image to
a pseudo-color image is taken into consideration.
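
The factor of 3 can be applied to the reported ratios as a simple check. The figures below are the ones quoted in this section; the dictionary keys and the function name are introduced only for the example.

```python
PSEUDO_COLOR_FACTOR = 3     # 24 bits/pixel reduced to 8 bits/pixel

# Compression ratios reported in the text, relative to the 8-bit image.
reported = {
    "Lempel-Ziv, unclassified": 1.4,
    "Lempel-Ziv, K-means classified": 8.0,
    "run-length": 2.8,
    "run-length + Lempel-Ziv": 6.7,
    "quadtree": 7.0,
    "quadtree + Lempel-Ziv": 10.5,
}


def overall_ratio(ratio):
    """Ratio relative to the original 24 bits/pixel image."""
    return round(ratio * PSEUDO_COLOR_FACTOR, 1)


overall = {name: overall_ratio(r) for name, r in reported.items()}
```

On this basis the best reported combination, quadtree coding followed by Lempel-Ziv coding, corresponds to 31.5:1 relative to the original 24 bits/pixel image.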


Conclusions 

The exponential behavior of the histogram of the runs indicates
that short runs have higher probability. Accordingly, Huffman
coding of the runs would result in more efficient bit
assignment and hence greater compression ratios. The
improvement in overall compression ratio when Lempel-Ziv
coding is applied to the output of the run-length coder further
indicates that Huffman coding of the runs would improve the
compression ratio. The quadtree coding approach offers a
higher compression ratio than run-length coding. The
comparison of computation times showed that run-length
coding is significantly faster than quadtree coding. Further
improvements will have to be made in quadtree coding to
decrease the computation time and memory requirement of the
decoding algorithm. The output of the quadtree coder uses 32
bits to store the node number of each node. Further
compression can be achieved by using only the smallest
number of bits required to store the node numbers.
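
The smallest number of bits needed for a node number can be estimated from the size of the complete quadtree. The sketch below assumes that nodes are numbered consecutively from 0 over the complete tree and takes n = 10 (a 1024 x 1024 bottom level) as a hypothetical example; neither assumption is confirmed by the text.

```python
def node_count(n):
    """Nodes in a complete quadtree whose bottom level is 2**n x 2**n."""
    return (4 ** (n + 1) - 1) // 3      # 1 + 4 + 16 + ... + 4**n


def bits_for_node_numbers(n):
    """Bits needed for the largest node number when numbering starts at 0."""
    return max(1, (node_count(n) - 1).bit_length())
```

Under these assumptions a complete tree has 1,398,101 nodes and 21 bits per node number suffice, compared with the 32 bits currently used.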